Skip to content

Proximity Generators Definitions #430

Draft
katyhr wants to merge 28 commits intomainfrom
Katy/ProximityGenerators
Draft

Proximity Generators Definitions #430
katyhr wants to merge 28 commits intomainfrom
Katy/ProximityGenerators

Conversation

@katyhr
Copy link
Copy Markdown
Collaborator

@katyhr katyhr commented Mar 26, 2026

No description provided.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 26, 2026

🤖 Gemini PR Summary

Note: The PR title "Proximity Generators Definitions" is significantly narrower than the scope of the changes described in the summary, which encompass broader coding theory, polynomial lemmas, and refinement of the Johnson bound.

Mathematical Formalization

  • Coding Theory:
    • Formalizes MDS codes and matrices, including the theorem that a linear code is MDS if and only if its generator matrix is MDS.
    • Formalizes word/code projections and the relationship between linear code dimension and generator matrix rank.
    • Proves isMDS_code for Reed-Solomon codes and refines their dimension theory.
  • Proximity Generators: Defines zero-evading, polynomial, MDS, and mutual correlated agreement (MCA) generators for proximity gap analysis.
  • Polynomial Theory:
    • Adds counting and probabilistic versions of the Schwartz-Zippel lemma for multivariate polynomials.
    • Introduces maxTotalDegree for finite polynomial families and establishes lemmas for the degree of their linear combinations and evaluations.

Refactoring and Rigor

  • Geometric Abstractions: Replaces explicit summations in correlated agreement definitions for affine spaces and curves with higher-level sampling abstractions using Curve.polynomialCurveFinite and affineSubspaceAtOrigin.
  • Johnson Bound: Enhances the formalization through explicit subcase analysis and consistent utilization of Fintype.card.
  • Theorem Signatures: Updates key signatures in BCIKS20 proximity gap modules to use implicit parameters for function sequences.

Proof Completion

  • All proofs are complete. No sorry or admit placeholders were introduced.

Infrastructure

  • Library Integration: Updates ArkLib.lean to export new modules for MDS codes, proximity generators, and Schwartz-Zippel variants.
  • Documentation: Adds bibliographic citations and descriptive documentation to coding theory modules.

Statistics

Metric Count
📝 Files Changed 12
Lines Added 1364
Lines Removed 419

Lean Declarations

✏️ **Removed:** 3 declaration(s)
  • lemma dim_eq_card_of_lt {ι : Type*} [Fintype ι] {F : Type*} [Field F] [DecidableEq F] in ArkLib/Data/CodingTheory/ReedSolomon.lean
  • lemma rateOfLinearCode_eq_min_div in ArkLib/Data/CodingTheory/ReedSolomon.lean
  • theorem dim_eq_min_deg_card {ι : Type*} [Fintype ι] {F : Type*} [Field F] [DecidableEq F] in ArkLib/Data/CodingTheory/ReedSolomon.lean
✏️ **Added:** 38 declaration(s)
  • lemma minSeedCard_pos {F : Type} {s : ℕ} (S : Fin s → Set F) in ArkLib/Data/CodingTheory/ProximityGap/ProximityGenerators.lean
  • lemma minSeedCard_le {F : Type} {s : ℕ} (S : Fin s → Set F) in ArkLib/Data/CodingTheory/ProximityGap/ProximityGenerators.lean
  • lemma prob_eval_zero_le_div in ArkLib/Data/MvPolynomial/SchwartzZippelCounting.lean
  • lemma eq_fromRowGenMat_matrixFromBasis [Field F] (LC : LinearCode ι F) : in ArkLib/Data/CodingTheory/Basic.lean
  • def projectedCode [Fintype ι] (C : Set (ι → F)) (T : Finset ι) : Set (T → F) in ArkLib/Data/CodingTheory/Basic.lean
  • def IsMDS {ι : Type} [Fintype ι] [CommRing F] [DecidableEq F] (LC : LinearCode ι F) : Prop in ArkLib/Data/CodingTheory/Basic.lean
  • lemma rank_genMatrix_eq_dim [Field F] (LC : LinearCode ι F) : in ArkLib/Data/CodingTheory/Basic.lean
  • lemma error_in_unit_interval (d : ℕ) (m : ℕ) (hm_pos : 0 < m) (hdm : d ≤ m) : (d / m : ℝ) ∈ I in ArkLib/Data/CodingTheory/ProximityGap/ProximityGenerators.lean
  • def IsMDSGenerator {S : Type} [Nonempty S] [Fintype S] [DecidableEq F] (G : Generator S ℓ F) : in ArkLib/Data/CodingTheory/ProximityGap/ProximityGenerators.lean
  • theorem schwartz_zippel_counting in ArkLib/Data/MvPolynomial/SchwartzZippelCounting.lean
  • def projectedWord [Fintype ι] (c : ι → F) (T : Finset ι) : T → F in ArkLib/Data/CodingTheory/Basic.lean
  • def Generator (S ℓ F : Type) : Type in ArkLib/Data/CodingTheory/ProximityGap/ProximityGenerators.lean
  • lemma matrix_IsMDS_of_IsMDS [Field F] [DecidableEq F] {G : Matrix (Fin k) (Fin n) F} in ArkLib/Data/CodingTheory/MDSCode.lean
  • def IsMCAGenerator {S : Type} [Nonempty S] [Fintype S] (G : Generator S ℓ F) (ε_mca : I → I) in ArkLib/Data/CodingTheory/ProximityGap/ProximityGenerators.lean
  • lemma uniform_prob_eq_card_div {α : Type} [Fintype α] [Nonempty α] in ArkLib/Data/MvPolynomial/SchwartzZippelCounting.lean
  • def Matrix.IsMDS [CommRing F] (G : Matrix (Fin k) (Fin n) F) : Prop in ArkLib/Data/CodingTheory/MDSCode.lean
  • def IsMDS [CommRing F] (G : Matrix (Fin k) (Fin n) F) : Prop in ArkLib/Data/CodingTheory/MDSCode.lean
  • def M_G {S : Type} [Nonempty S] [Fintype S] (G : Generator S ℓ F) : Matrix S ℓ F in ArkLib/Data/CodingTheory/ProximityGap/ProximityGenerators.lean
  • lemma vecMul_injective_of_rank_eq [Field F] {G : Matrix (Fin k) (Fin n) F} (hrank : G.rank = k) : in ArkLib/Data/CodingTheory/MDSCode.lean
  • theorem totalDegree_linearCombination_le in ArkLib/Data/MvPolynomial/Degrees.lean
  • lemma dim_fromRowGenMat {k n : ℕ} [Field F] {G : Matrix (Fin k) (Fin n) F} : in ArkLib/Data/CodingTheory/Basic.lean
  • def IsZeroEvadingGenerator {S : Type} [Nonempty S] [Fintype S] (G : Generator S ℓ F) (ε_ze : I) : in ArkLib/Data/CodingTheory/ProximityGap/ProximityGenerators.lean
  • lemma ENNReal.div_le_div_of_mul_le {k n d m : ℕ} in ArkLib/Data/MvPolynomial/SchwartzZippelCounting.lean
  • lemma isMDS_code {ι : Type} [Fintype ι] [DecidableEq ι] {F : Type*} [Field F] [DecidableEq F] in ArkLib/Data/CodingTheory/ReedSolomon.lean
  • theorem linearCombination_ne_zero in ArkLib/Data/CodingTheory/Prelims.lean
  • lemma card_filter_eval_subtype_eq_piFinset in ArkLib/Data/MvPolynomial/SchwartzZippelCounting.lean
  • def IsPolynomialGenerator {s : ℕ} (S : Fin s → Set F) (G : Generator (∀ i, S i) ℓ F) : Prop in ArkLib/Data/CodingTheory/ProximityGap/ProximityGenerators.lean
  • lemma eq_span_rows [Field F] (LC : LinearCode ι F) : in ArkLib/Data/CodingTheory/Basic.lean
  • def IsMCA {S : Type} [Nonempty S] [Fintype S] (G : Generator S ℓ F) (LC : LinearCode ι F) in ArkLib/Data/CodingTheory/ProximityGap/ProximityGenerators.lean
  • lemma gen_matrix_exists [Field F] (LC : LinearCode ι F) : in ArkLib/Data/CodingTheory/Basic.lean
  • lemma linear_code_is_FG [Field F] (LC : LinearCode ι F) : LC.FG in ArkLib/Data/CodingTheory/Basic.lean
  • theorem remark_3_20 in ArkLib/Data/CodingTheory/ProximityGap/ProximityGenerators.lean
  • lemma IsMDS_of_matrix_IsMDS [Field F] [DecidableEq F] {G : Matrix (Fin k) (Fin n) F} in ArkLib/Data/CodingTheory/MDSCode.lean
  • lemma colRank_genMatrix_eq_dim_of_MDS [Field F] [DecidableEq F] in ArkLib/Data/CodingTheory/MDSCode.lean
  • def IsPolynomialGeneratorOf {s : ℕ} (S : Fin s → Set F) (G : Generator (∀ i, S i) ℓ F) in ArkLib/Data/CodingTheory/ProximityGap/ProximityGenerators.lean
  • lemma minWt_ge_of_MDS [Field F] [DecidableEq F] {G : Matrix (Fin k) (Fin n) F} in ArkLib/Data/CodingTheory/MDSCode.lean
  • theorem dotProduct_eq_eval_linearCombination in ArkLib/Data/MvPolynomial/Degrees.lean
  • lemma pmf_prob_le_one {α : Type} [Fintype α] [Nonempty α] in ArkLib/Data/MvPolynomial/SchwartzZippelCounting.lean
✏️ **Affected:** 7 declaration(s) (line number changed)
  • theorem correlatedAgreement_affine_spaces {k : ℕ} [NeZero k] {u : Fin (k + 1) → ι → F} in ArkLib/Data/CodingTheory/ProximityGap/BCIKS20/AffineSpaces.lean moved from L32 to L32
  • def neqCols [DecidableEq F] (U V : Matrix ι ι' F) : Finset ι' in ArkLib/Data/CodingTheory/Prelims.lean moved from L43 to L47
  • theorem johnson_bound_alphabet_free [Field F] [DecidableEq F] in ArkLib/Data/CodingTheory/JohnsonBound/Basic.lean moved from L307 to L322
  • lemma rateOfLinearCode_eq_div' {ι : Type*} [Fintype ι] {F : Type*} [Field F] [DecidableEq F] in ArkLib/Data/CodingTheory/ReedSolomon.lean moved from L378 to L304
  • lemma sqrt_le_J {q δ : ℚ} (hq : q > 1) (hx0 : 0 ≤ δ) (hx1 : δ ≤ 1) (hqx : q / (q - 1) * δ ≤ 1) : in ArkLib/Data/CodingTheory/JohnsonBound/Basic.lean moved from L63 to L64
  • theorem correlatedAgreement_affine_curves {k : ℕ} {u : Fin k → ι → F} in ArkLib/Data/CodingTheory/ProximityGap/BCIKS20/Curves.lean moved from L27 to L27
  • lemma rank_eq_dim_fromColGenMat [CommRing F] {G : Matrix κ ι F} : in ArkLib/Data/CodingTheory/Basic.lean moved from L2010 to L2149

sorry Tracking

  • No sorrys were added, removed, or affected.

🎨 **Style Guide Adherence**

There are more than 20 violations of the style guide. The violations are grouped by rule below:

  • Naming Conventions: Theorems and Proofs MUST use snake_case. (12 violations)

    • rank_genMatrix_eq_dim (ArkLib/Data/CodingTheory/Basic.lean:2109)
    • dim_fromRowGenMat (ArkLib/Data/CodingTheory/Basic.lean:2118)
    • minWt_ge_of_MDS (ArkLib/Data/CodingTheory/MDSCode.lean:37)
    • Rule: "Theorems and Proofs: snake_case (e.g., add_comm, list_reverse_id)."
  • Naming Conventions: Functions and Terms MUST use lowerCamelCase. (10 violations)

    • JohnsonDenominator (ArkLib/Data/CodingTheory/JohnsonBound/Basic.lean:22)
    • JohnsonConditionStrong (ArkLib/Data/CodingTheory/JohnsonBound/Basic.lean:38)
    • Generator (ArkLib/Data/CodingTheory/ProximityGap/ProximityGenerators.lean:57)
    • Rule: "Functions and Terms: lowerCamelCase (e.g., binarySearch, isPrime)."
  • Naming Conventions: Acronyms MUST be treated as words. (15 violations)

    • IsMDS (ArkLib/Data/CodingTheory/Basic.lean:2031)
    • hMDS (ArkLib/Data/CodingTheory/MDSCode.lean:40)
    • IsMCA (ArkLib/Data/CodingTheory/ProximityGap/ProximityGenerators.lean:90)
    • Rule: "Acronyms: Treat as words (e.g., HtmlParser not HTMLParser)."
  • Syntax and Formatting: Avoid using ; to separate tactics or terminate lines in multiline blocks. (38 violations)

    • unfold fromRowGenMat; (ArkLib/Data/CodingTheory/Basic.lean:2121)
    • apply Matrix.rank_eq_if_det_ne_zero; (ArkLib/Data/CodingTheory/MDSCode.lean:134)
    • convert uniform_prob_eq_card_div _; (ArkLib/Data/MvPolynomial/SchwartzZippelCounting.lean:118)
    • Rule: "Avoid using ; to separate tactics unless writing short, single-line tactic sequences."
  • Syntax and Formatting: Tactic block by placement. (16 violations)

    • dim (fromRowGenMat G) = G.rank := by (ArkLib/Data/CodingTheory/Basic.lean:2120)
    • have h_singleton_bound : ... := by (ArkLib/Data/CodingTheory/MDSCode.lean:92)
    • ... card_div {α : Type} [Fintype α] [Nonempty α] (P : α → Prop) [DecidablePred P] : ... = ... := by (ArkLib/Data/MvPolynomial/SchwartzZippelCounting.lean:65)
    • Rule: "Place by at the end of the line preceding the tactic block. Indent the tactic block."
  • Variable Conventions: Standardize names for sets and hypotheses. (11 violations)

    • (C : Set (ι → F)) (ArkLib/Data/CodingTheory/Basic.lean:1807) — Uses C for a Set.
    • obtain ⟨σ, hσ⟩ (ArkLib/Data/CodingTheory/MDSCode.lean:51) — Uses for a hypothesis.
    • (hdm : maxTotalDegree P ≤ minSeedCard S) (ArkLib/Data/CodingTheory/ProximityGap/ProximityGenerators.lean:161) — Uses hdm for a hypothesis.
    • Rules: "s, t, ... : Sets or Lists"; "h, h₁, ... : Assumptions/Hypotheses".
  • Syntax and Formatting: Use fun x ↦ ... syntax. (12 violations)

    • ( fun i => G i ) (ArkLib/Data/CodingTheory/Basic.lean:2124)
    • fun x => Pr_{...} (ArkLib/Data/CodingTheory/ProximityGap/ProximityGenerators.lean:168)
    • fun x => MvPolynomial.eval x f = 0 (ArkLib/Data/MvPolynomial/SchwartzZippelCounting.lean:36)
    • Rule: "Functions: Prefer fun x ↦ ... over λ x, ...." (Note: The diff uses => instead of the required symbol).
  • Syntax and Formatting: Line Length. (4 violations)

    • ArkLib/Data/CodingTheory/JohnsonBound/Basic.lean:201
    • ArkLib/Data/MvPolynomial/SchwartzZippelCounting.lean:43
    • Rule: "Line Length: Keep lines under 100 characters."
  • Formatting: Spaces and Delimiters. (8 violations)

    • ( fun i => G i ) (ArkLib/Data/CodingTheory/Basic.lean:2124) — Unnecessary parentheses and extra spaces.
    • simp +decide (ArkLib/Data/CodingTheory/Basic.lean:2126) — Missing spaces around operator +.
    • Rule: "Avoid parentheses where possible."; "Put spaces on both sides of :, :=, and infix operators."

📄 **Per-File Summaries**
  • ArkLib.lean: This update adds imports for new modules covering MDS codes, proximity generators, and Schwartz-Zippel counting for multivariate polynomials to the main library entry point. These additions expand the library's scope by integrating new definitions and theorems related to coding theory and polynomial properties.
  • ArkLib/Data/CodingTheory/JohnsonBound/Basic.lean: This Lean file update refines the formalization of the Johnson bound by adding descriptive documentation and significantly expanding the proofs for key results, such as the alphabet-free Johnson bound. The changes improve proof readability and rigor through more explicit subcase analysis and the consistent use of Fintype.card, all without introducing any sorry or admit placeholders.
  • ArkLib/Data/CodingTheory/MDSCode.lean: This file introduces the fundamental definitions and theorems for Maximum Distance Separable (MDS) codes and matrices. It establishes the key equivalence theorem that a linear code is MDS if and only if its generator matrix is MDS, and it contains no sorry placeholders.
  • ArkLib/Data/CodingTheory/Prelims.lean: This change introduces the linearCombination_ne_zero theorem, which proves that a non-zero linear combination of linearly independent vectors is itself non-zero. Additionally, it includes minor refactoring of imports, formatting, and variable declarations within the Matrix namespace. No sorry or admit placeholders were added.
  • ArkLib/Data/CodingTheory/ProximityGap/BCIKS20/AffineSpaces.lean: This change modifies the signature of the correlatedAgreement_affine_spaces theorem by introducing an implicit parameter u representing a sequence of functions. The update refines the theorem's statement without introducing new definitions or sorry placeholders.
  • ArkLib/Data/CodingTheory/ProximityGap/BCIKS20/Curves.lean: This change modifies the signature of the theorem correlatedAgreement_affine_curves by introducing an implicit parameter {u : Fin k → ι → F} representing the sequence of words. No new sorry or admit placeholders are introduced.
  • ArkLib/Data/CodingTheory/ProximityGap/Basic.lean: This PR refactors the definitions δ_ε_correlatedAgreementCurves and δ_ε_correlatedAgreementAffineSpaces to use higher-level geometric abstractions for their underlying probability distributions. Specifically, explicit summations are replaced with sampling from Curve.polynomialCurveFinite and affineSubspaceAtOrigin, streamlining the formal statements of correlated agreement. No sorry or admit placeholders were introduced.
  • ArkLib/Data/MvPolynomial/Degrees.lean: This change introduces the maxTotalDegree definition for finite families of multivariate polynomials along with lemmas for bounding the total degree of their linear combinations. It also adds a theorem relating the dot product of evaluated polynomials to the evaluation of their linear combination, with no sorry placeholders introduced.
  • ArkLib/Data/MvPolynomial/SchwartzZippelCounting.lean: This file introduces counting and probabilistic versions of the Schwartz-Zippel lemma for multivariate polynomials, deriving them from Mathlib’s existing schwartz_zippel_sup_sum theorem. The new theorems bound the number of roots and the probability of evaluating to zero over a uniform product distribution based on the polynomial's total degree and the size of the evaluation sets. No sorry placeholders are used in the proofs.
  • ArkLib/Data/CodingTheory/ProximityGap/ProximityGenerators.lean: This file introduces fundamental definitions for proximity generators in coding theory, including zero-evading, polynomial, MDS, and mutual correlated agreement (MCA) generators. It also includes the theorem remark_3_20, which proves that polynomial generators satisfy the zero-evading property by applying the Schwartz-Zippel lemma. No sorry or admit placeholders are present in the code.
  • ArkLib/Data/CodingTheory/Basic.lean: This PR introduces definitions for word and code projections and formalizes the relationship between linear codes and their generator matrices, including proving the existence of a generator matrix and relating its rank to the code's dimension. It also adds a definition for Maximum Distance Separable (IsMDS) codes and incorporates several bibliographic references for these coding theory concepts. No sorry or admit placeholders were added.
  • ArkLib/Data/CodingTheory/ReedSolomon.lean: This change refactors the dimension theory of Reed-Solomon codes by simplifying the proof of dim_eq_deg_of_le' and removing redundant lemmas regarding cases where the degree exceeds the number of evaluation points. It also introduces the isMDS_code lemma, which formally proves that Reed-Solomon codes are maximum distance separable (MDS).

Last updated: 2026-04-13 18:22 UTC.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 26, 2026

Build Timing Report

  • Commit: aec49d9
  • Message: Merge 81adf53 into ccc68ba
  • Ref: Katy/ProximityGenerators
  • Comparison baseline: cf14a18 from the previous successful PR update.
  • Measured on ubuntu-latest with /usr/bin/time -p.
  • Commands: clean build rm -rf .lake/build && lake build; warm rebuild lake build; validation wrapper ./scripts/validate.sh.
Measurement Baseline (s) Current (s) Delta (s) Status
Clean build 631.65 663.69 +32.04 ok
Warm rebuild 4.37 4.28 -0.09 ok
Validation wrapper 4.14 4.00 -0.14 ok

Incremental Rebuild Signal

  • Warm rebuild saved 659.41s vs clean (155.07x faster).

This compares a clean project build against an incremental rebuild in the same CI job; it is a lightweight variability signal, not a full cross-run benchmark.

Slowest Current Clean-Build Files

Showing 20 slowest current targets, with comparison against the selected baseline when available.

Current (s) Baseline (s) Delta (s) Path
91.00 81.00 +10.00 ArkLib/Data/CodingTheory/JohnsonBound/Lemmas.lean
68.00 59.00 +9.00 ArkLib/Data/CodingTheory/Basic.lean
68.00 88.00 -20.00 ArkLib/ProofSystem/Fri/Spec/SingleRound.lean
67.00 65.00 +2.00 ArkLib/Data/CodingTheory/GuruswamiSudan/Basic.lean
57.00 47.00 +10.00 ArkLib/Data/CodingTheory/BerlekampWelch/Condition.lean
53.00 39.00 +14.00 ArkLib/OracleReduction/Security/RoundByRound.lean
53.00 51.00 +2.00 ArkLib/Data/CodingTheory/ProximityGap/BCIKS20/AffineLines/BWMatrix.lean
45.00 44.00 +1.00 ArkLib/OracleReduction/LiftContext/Reduction.lean
37.00 33.00 +4.00 ArkLib/Data/CodingTheory/JohnsonBound/Basic.lean
34.00 18.00 +16.00 ArkLib/OracleReduction/Basic.lean
33.00 31.00 +2.00 ArkLib/Data/CodingTheory/DivergenceOfSets.lean
32.00 38.00 -6.00 ArkLib/Data/CodingTheory/ProximityGap/DG25.lean
30.00 46.00 -16.00 ArkLib/ProofSystem/Binius/BinaryBasefold/Prelude.lean
29.00 27.00 +2.00 ArkLib/ProofSystem/Fri/Domain.lean
28.00 27.00 +1.00 ArkLib/OracleReduction/ProtocolSpec/Basic.lean
27.00 29.00 -2.00 ArkLib/Data/CodingTheory/PolishchukSpielman/Resultant.lean
26.00 24.00 +2.00 ArkLib/Data/Hash/Poseidon2.lean
26.00 25.00 +1.00 ArkLib/OracleReduction/Composition/Sequential/Append.lean
25.00 14.00 +11.00 ArkLib/OracleReduction/ProtocolSpec/SeqCompose.lean
25.00 23.00 +2.00 ArkLib/Data/CodingTheory/ProximityGap/BCIKS20/AffineLines/GoodCoeffs.lean

@github-actions
Copy link
Copy Markdown

🤖 Initial AI review without external context

🤖 AI Review

Overall Summary:

1. TL;DR

The PR establishes solid foundations for linear codes and proximity gaps, but requires structural updates due to a mathematically inaccurate MDS theorem statement, multiple critical type mismatches in the proximity generator definitions, and an incomplete sorry'd proof.

2. Checklist Coverage

No explicit specification checklist was provided for this PR. The review instead focused on strict mathematical fidelity, type correctness, Lean 4 / Mathlib idioms, and proof completeness.

3. Critical Misformalizations

  • MDS Property Misformalization (ArkLib/Data/CodingTheory/Basic.lean):
    The lemma colRank_genMatrix_eq_dim_of_MDS fails to capture the MDS property. Stating Matrix.colRank (genMatrixFromCode LC) = dim LC asserts a trivial fact true for all linear codes. Furthermore, the lemma lacks the IsMDS LC hypothesis entirely. To be mathematically correct, you must assume IsMDS LC and state that any submatrix formed by choosing exactly dim LC columns has full rank.
    Correction Example:

    lemma rank_submatrix_eq_dim_of_isMDS [Field F] [DecidableEq F] {LC : LinearCode ι F} 
        (hMDS : isMDS LC) (s : Finset ι) (hs : s.card = dim LC) :
        Matrix.rank ((genMatrixFromCode LC).submatrix id (fun x : s => x.val)) = dim LC := ...
  • Proximity Generator Type Mismatches (ArkLib/Data/CodingTheory/ProximityGap/ProximityGenerators.lean):
    There are several compounding type errors in the generator definitions that must be resolved:

    • Heterogeneous Inequality: In Condition, Finset.card T ≥ (Fintype.card ι) * (1 - γ) attempts to compare with . Lean 4 will not implicitly coerce to here. Both integers must be explicitly cast: (T.card : ℝ) ≥ (Fintype.card ι : ℝ) * (1 - γ).
    • Subtype Evaluation Failure: In IsPolynomialGenerator, fun i ↦ x i returns a Subtype (S i), but MvPolynomial.eval expects F. Explicitly extract the value: fun i ↦ (x i : F).
    • Invalid Interval Types: In IsMCAGenerator, ε_mca : Set.Icc 0 1 → Set.Icc 0 1 is invalid. Bare integers in sets default to Nat, meaning this defines a function between sets of natural numbers, and it cannot be evaluated at the raw real γ. Use Mathlib's UnitInterval → UnitInterval (or ℝ → ℝ).
    • Probability Domain Mismatch: Also in IsMCAGenerator, comparing Pr_{...}[...] (ENNReal) directly to Real.toEReal (ε_mca γ) (EReal) fails because the types are disjoint. Cast the boundary using ENNReal.ofReal.

4. Key Lean 4 / Mathlib Issues

  • Incomplete Proofs [Hard Rule Violation] (1 file affected: Basic.lean):
    The proof for colRank_genMatrix_eq_dim_of_MDS is sorry'd. All partial proofs must be completed or properly scoped before the PR can be merged.
  • Submodule / SetLike Coercion Safety (2 files affected: Basic.lean, ProximityGenerators.lean):
    Avoid bypassing abstractions (e.g., using Code.dist LC.carrier) or relying on fragile implicit Submodule coercions for Sets. It is much safer and more idiomatic to explicitly cast linear codes to sets using (LC : Set (ι → F)) or ↑LC to prevent typeclass resolution failures in projection functions.
  • Minor Style & Naming Violations (Nitpicks):
    • Naming: Predicates should be lowerCamelCase (IsMDSisMDS). In ReedSolomon.lean, rename isMDS_code to isMDS_code' (or isMDS') to stay consistent with the primed lemma naming convention used in the rest of the file.
    • Universes: Generator (S ℓ F : Type) strictly limits the definition to Type 0. Generalize this idiomatically to Type*.
    • Redundancy: Remove redundant double parentheses in LinearCode.IsMDS ((ReedSolomon.code α n)) and drop the redundant namespace in SupSet.sSup (use sSup).

5. Overall Verdict

Changes Requested


📄 **Review for `ArkLib.lean`**

Verdict: Approved

Critical Misformalizations:
None

Lean 4 / Mathlib Issues:
None

Nitpicks:
None

📄 **Review for `ArkLib/Data/CodingTheory/Basic.lean`**

Verdict: Changes Requested

Critical Misformalizations:

  • colRank_genMatrix_eq_dim_of_MDS misformalizes the MDS property.
    The lemma statement Matrix.colRank (genMatrixFromCode LC) = dim LC asserts that the column rank of the entire generator matrix is equal to the code's dimension. Because row rank equals column rank and the rows are constructed directly from a basis of LC, this statement is a basic linear algebra fact that is trivially true for all linear codes, not just MDS codes.
    Additionally, the lemma completely lacks the assumption that the code is actually MDS. To correctly formalize the property described in the docstring ("any dim columns... are linearly independent"), you must require IsMDS LC and state that any submatrix formed by choosing exactly dim LC columns has full rank. For example:
    lemma rank_submatrix_eq_dim_of_isMDS [Field F] [DecidableEq F] {LC : LinearCode ι F} 
        (hMDS : IsMDS LC) (s : Finset ι) (hs : s.card = dim LC) :
        Matrix.rank ((genMatrixFromCode LC).submatrix id (fun x : s => x.val)) = dim LC := by
      sorry

Lean 4 / Mathlib Issues:

  • Hard Rule Violation: The proof for colRank_genMatrix_eq_dim_of_MDS is sorry'd. Incomplete proofs must be resolved before the PR can be approved.

Nitpicks:

  • In IsMDS, Code.dist LC.carrier directly accesses the .carrier field, breaking the SetLike abstraction. It is much more idiomatic to rely on the coercion to Set: Code.dist (LC : Set (ι → F)) or simply Code.dist ↑LC.
  • IsMDS is a Prop-valued def. According to Lean 4 / Mathlib naming conventions, non-class predicates should generally be lowerCamelCase. Consider renaming it to isMDS.
📄 **Review for `ArkLib/Data/CodingTheory/ProximityGap/ProximityGenerators.lean`**

Verdict: Changes Requested

Critical Misformalizations:

  • Heterogeneous Comparison in Condition: The expression Finset.card T ≥ (Fintype.card ι) * (1 - γ) attempts to compare a with a (since γ : ℝ). Lean 4 does not implicitly coerce to in inequalities, meaning this will result in a type mismatch. You must explicitly cast both integers to :
    (T.card : ℝ) ≥ (Fintype.card ι : ℝ) * (1 - γ)
  • Subtype Mismatch in IsPolynomialGenerator: The lambda fun i ↦ x i passed to MvPolynomial.eval results in a type error. Because S i is a Set F, x i evaluates to the subtype Subtype (S i). However, MvPolynomial.eval expects a function directly into F. You must explicitly extract the value:
    fun i ↦ (x i : F) or fun i ↦ (x i).val
  • Invalid Type for ε_mca in IsMCAGenerator: The parameter (ε_mca : Set.Icc 0 1 → Set.Icc 0 1) is fundamentally broken. When Set.Icc 0 1 is used as a type without context, 0 and 1 will default to Nat, meaning this specifies a function between sets of natural numbers. Furthermore, even if properly constrained to Reals, applying it directly to γ via ε_mca γ will fail because γ is bound as a raw in the binder, but ε_mca expects a Subtype. You should use either ℝ → ℝ or Mathlib's UnitInterval → UnitInterval.
  • Type Mismatch with EReal in IsMCAGenerator: The target probability evaluated by Pr_{...}[...] uses ENNReal (ℝ≥0∞). However, you are comparing it against Real.toEReal (ε_mca γ), which produces an EReal (ℝ ∪ {-∞, ∞}). ENNReal and EReal are disjoint types and cannot be directly compared with . You should cast the real boundary to ENNReal using ENNReal.ofReal.

Here is the corrected code for these definitions:

def Condition {S : Type} [Nonempty S] [Fintype S] (G : Generator S ℓ F) (U : ℓ → ι → F)
  (T : Finset ι) (γ : ℝ) (LC : LinearCode ι F) (x : S) : Prop :=
  let v := Matrix.vecMul (G x) (Matrix.of U)
  (T.card : ℝ) ≥ (Fintype.card ι : ℝ) * (1 - γ) ∧
  Code.projectedWord v T ∈ Code.projectedCode (LC : Set (ι → F)) T ∧
  ∃ j : ℓ, Code.projectedWord (U j) T ∉ Code.projectedCode (LC : Set (ι → F)) T

def IsPolynomialGenerator {s : ℕ} {S : Fin s → Set F} (G : Generator (∀ i, S i) ℓ F) : Prop :=
  ∃ P : ℓ → MvPolynomial (Fin s) F, LinearIndependent F P ∧
  ∀ x : (∀ i, S i), G x = MvPolynomial.eval (fun i ↦ (x i : F)) ∘ P

And for IsMCAGenerator, using the UnitInterval topology that you have already imported at the top of the file:

def IsMCAGenerator {S : Type} [Nonempty S] [Fintype S] (G : Generator S ℓ F)
  (ε_mca : UnitInterval → UnitInterval) (U : ℓ → ι → F) (T : Finset ι) (LC : LinearCode ι F) : Prop :=
  ∀ γ : UnitInterval,
  Pr_{let x ←$ᵖ S}[(Condition G U T (γ : ℝ) LC x) ] ≤ ENNReal.ofReal (ε_mca γ)

(Alternatively, you can type ε_mca as ℝ → ℝ and bind ∀ γ ∈ Set.Icc (0 : ℝ) 1.)

Lean 4 / Mathlib Issues:

  • Implicit Submodule Coercion: In Condition, LC has type LinearCode ι F (which is an alias for Submodule F (ι → F)). Code.projectedCode formally expects a Set (ι → F). While Lean's SetLike will sometimes implicitly coerce this, it is safer and standard practice to write (LC : Set (ι → F)) or LC.carrier to prevent typeclass resolution failures (as demonstrated in the corrected snippet above).

Nitpicks:

  • Universe Polymorphism: Generator (S ℓ F : Type) : Type limits the definition specifically to Type 0. It is more idiomatic to define it as Generator (S ℓ F : Type*) : Type*.
  • Redundant Namespace: SupSet.sSup can simply be written as sSup, which is globally available and conventionally preferred over prefixing the typeclass.
📄 **Review for `ArkLib/Data/CodingTheory/ReedSolomon.lean`**

Verdict: Needs Minor Revisions

Critical Misformalizations:
None

Lean 4 / Mathlib Issues:
None

Nitpicks:

  • Redundant Parentheses: In the theorem statement, the double parentheses around the code are unnecessary. LinearCode.IsMDS ((ReedSolomon.code α n)) can be simplified to LinearCode.IsMDS (ReedSolomon.code α n).
  • Naming Consistency: Throughout ArkLib/Data/CodingTheory/ReedSolomon.lean, lemmas that generalize properties from a Fin m index to an arbitrary finite index type ι are consistently named with a ' (prime) suffix (e.g., minDist', dim_eq_deg_of_le', uniqueDecodingRadius_RS_eq'). To align with the internal conventions of this file, consider renaming isMDS_code to isMDS_code' (or simply isMDS').
  • Variable Bindings: The variable n is implicitly resolved via the variable {deg m n : ℕ} block declared earlier in the file. While this perfectly type-checks (and was done similarly for minDist'), it is generally slightly better for readability to explicitly declare {n : ℕ} in the theorem signature as was done in dist_eq'.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant